<!--begin.rcode results='hide', echo=FALSE, message=FALSE
library(caret)
library(pROC)
library(gbm)
library(nlme)
library(mlbench)
library(doMC)
registerDoSEQ()

hook_inline = knit_hooks$get('inline')
knit_hooks$set(inline = function(x) {
  if (is.character(x)) highr::hi_html(x) else hook_inline(x)
  })
opts_chunk$set(comment=NA)

options(width = 150)

session <- paste(format(Sys.time(), "%a %b %d %Y"),
                 "using caret version",
                 packageDescription("caret")$Version,
                 "and",
                 R.Version()$version.string)
    end.rcode-->

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <!--
  Design by Free CSS Templates
http://www.freecsstemplates.org
Released for free under a Creative Commons Attribution 2.5 License

Name       : Emerald 
Description: A two-column, fixed-width design with dark color scheme.
Version    : 1.0
Released   : 20120902

-->
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta name="keywords" content="" />
  <meta name="description" content="" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Adaptive Resampling</title>
  <link href='http://fonts.googleapis.com/css?family=Abel' rel='stylesheet' type='text/css'>
  <link href="style.css" rel="stylesheet" type="text/css" media="screen" />
  </head>
  <body>
  <div id="wrapper">
  <div id="header-wrapper" class="container">
  <div id="header" class="container">
  <div id="logo">
  <h1><a href="#">Adaptive Resampling</a></h1>
</div>
  <!--
  <div id="menu">
  <ul>
  <li class="current_page_item"><a href="#">Homepage</a></li>
<li><a href="#">Blog</a></li>
<li><a href="#">Photos</a></li>
<li><a href="#">About</a></li>
<li><a href="#">Contact</a></li>
</ul>
  </div>
  -->
  </div>
  <div><img src="images/img03.png" width="1000" height="40" alt="" /></div>
  </div>
  <!-- end #header -->
<div id="page">
  <div id="content">
      
 <div id="outcome"></div>        

<p>
Models can benefit significantly from tuning but the optimal values are rarely known beforehand. <span class="mx funCall">train</span> can be used to define a grid of possible points and resampling can be used to generate good estimates of performance for each tuning parameter combination. However, in the nominal resampling process, all the tuning parameter combinations are computed for all the resamples before a choice is made about which parameters are good and which are poor. 
</p>
<p>
<a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a> contains the ability to adaptively resample the tuning parameter grid in a way that concentrates on values that are the in the neighborhood of the optimal settings. See <a href="http://arxiv.org/abs/1405.6974">this paper</a> for the details. 
</p>
<p>
To illustrate, we will use the chemical mutagenicity data from <a href="http://pubs.acs.org/doi/abs/10.1021/jm040835a">Kazius et al (2005)</a>:
</p>
<!--begin.rcode adapt_mutagen
library(QSARdata)
data(Mutagen)

set.seed(4567)
inTraining <- createDataPartition(Mutagen_Outcome, p = .75, list = FALSE)
training_x <- Mutagen_Dragon[ inTraining,]
training_y <- Mutagen_Outcome[ inTraining]
testing_x  <- Mutagen_Dragon[-inTraining,]
testing_y  <- Mutagen_Outcome[-inTraining]

## Get rid of predictors that are very sparse
nzv <- nearZeroVar(training_x)
training_x <- training_x[, -nzv]
testing_x  <-  testing_x[, -nzv]
    end.rcode-->
<p>
Previously, we used this code to tune the model:
</p>
<!--begin.rcode adapt_full,tidy=FALSE,cache = TRUE
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 5,
                           ## Estimate class probabilities
                           classProbs = TRUE,
                           ## Evaluate performance using 
                           ## the following function
                           summaryFunction = twoClassSummary)

set.seed(825)
svmFit <- train(x = training_x,
                y = training_y, 
                method = "svmRadial", 
                trControl = fitControl, 
                preProc = c("center", "scale"),
                tuneLength = 8,
                metric = "ROC")
    end.rcode-->
<p>Using this method, the optimal tuning parameters were a RBF kernel parameter of <!--rinline I(round(svmFit$bestTune$sigma, 4))-->  and a cost value of <!--rinline I(svmFit$bestTune$C)-->. To use the adaptive procedure, the <span class="mx funCall">trainControl</span> option needs some additional arguments:
</p>
  <ul>
  <li> <span class="mx arg">min</span> is the minimum number of resamples that will be used for each tuning parameter. The default value is 5 and increasing it will decrease the speed-up generated by adaptive resampling but should also increase the likelihood of finding a good model. </li>
  <li> <span class="mx arg">alpha</span> is a confidence level that is used to remove parameter settings. To date, this value has not shown much of an effect.  </li>
  <li> <span class="mx arg">method</span> is either <tt><!--rinline '"gls"' --></tt> for a linear model or <tt><!--rinline '"BT"' --></tt> for a Bradley-Terry model. The latter may be more useful when you expect the model to do very well (e.g. an area under the ROC curve near 1) or when there are a large number of tuning parameter settings.  </li>
  <li> <span class="mx arg">complete</span> is a logical value that specifies whether <span class="mx funCall">train</span> should generate the full resampling set if it finds an optimal solution before the end of resampling. If you want to know the optimal parameter settings and don't care much for the estimated performance value, a value of <tt><!--rinline 'FALSE' --></tt> would be appropriate here. </li>
  </ul>
<p>
The new code is:
</p>
<!--begin.rcode adapt_gls,tidy=FALSE,cache = TRUE
fitControl2 <- trainControl(method = "adaptive_cv",
                            number = 10,
                            repeats = 5,
                            ## Estimate class probabilities
                            classProbs = TRUE,
                            ## Evaluate performance using 
                            ## the following function
                            summaryFunction = twoClassSummary,
                            ## Adaptive resampling information:
                            adaptive = list(min = 10, 
                                            alpha = 0.05, 
                                            method = "gls",
                                            complete = TRUE))

set.seed(825)
svmFit2 <- train(x = training_x,
                 y = training_y,  
                 method = "svmRadial", 
                 trControl = fitControl2, 
                 preProc = c("center", "scale"),
                 tuneLength = 8,
                 metric = "ROC")
    end.rcode-->    
<p>
These computations were <!--rinline I(round(svmFit$times$everything[3]/svmFit2$times$everything[3], 1))  -->-fold faster than the original analysis. Here, the optimal tuning parameters were a RBF kernel parameter of <!--rinline I(round(svmFit2$bestTune$sigma, 4))-->  and a cost value of <!--rinline I(svmFit2$bestTune$C)-->. These match the previous settings.
<p/>
<p>
Remember that this methodology is experimental, so please send any questions or bug reports to the package maintainer. 
</p>


<div style="clear: both;">&nbsp;</div>
  </div>
  <!-- end #content -->
<div id="sidebar">
<ul>
  <li>
    <h2>General Topics</h2>
    <ul>
      <li><a href="index.html">Front Page</a></li>
      <li><a href="visualizations.html">Visualizations</a></li>
      <li><a href="preprocess.html">Pre-Processing</a><li>
      <li><a href="splitting.html">Data Splitting</a></li>
      <li><a href="varimp.html">Variable Importance</a></li>
      <li><a href="other.html">Model Performance</a></li>
      <li><a href="parallel.html">Parallel Processing</a></li>
    </ul>
    <h2>Model Training and Tuning</h2>
    <ul>
      <li><a href="training.html">Basic Syntax</a></li>
      <li><a href="modelList.html">Sortable Model List</a></li>
      <li><a href="bytag.html">Models By Tag</a></li>
      <li><a href="similarity.html">Models By Similarity</a></li>
      <li><a href="custom_models.html">Using Custom Models</a></li>
      <li><a href="sampling.html">Sampling for Class Imbalances</a></li> 
      <li><a href="random.html">Random Search</a></li> 
      <li><a href="adaptive.html">Adaptive Resampling</a></li> 
    </ul>
    <h2>Feature Selection</h2>
    <ul>
      <li><a href="featureselection.html">Overview</a>
      <li><a href="rfe.html">RFE</a></li>
      <li><a href="filters.html">Filters</a></li>
      <li><a href="GA.html">GA</a></li>
      <li><a href="SA.html">SA</a></li>
    </ul>  
  </li>
</ul>
</div>
<!-- end #sidebar -->
<div style="clear: both;">&nbsp;</div>
  </div>
  <div class="container"><img src="images/img03.png" width="1000" height="40" alt="" /></div>
  <!-- end #page -->
</div>
  <div id="footer-content"></div>
<!--begin.rcode echo = FALSE
knit_hooks$set(inline = hook_inline)    
    end.rcode-->   
  <div id="footer">
  <p>Created on <!--rinline I(session) -->.</p>
  </div>
  <!-- end #footer -->
</body>
  </html>
